The MultiRank Bootstrap Algorithm: Semi-Supervised Political Blog Classification and Ranking Using Semi-Supervised Link Classification

Authors

  • Frank Lin
  • William W. Cohen
Abstract

We present a new, intuitive semi-supervised learning algorithm for classifying political blogs in a blog network and ranking them within classes. In the algorithm, each link, as well as each blog, is assigned a label. Using only the link structure as input and exploiting the linking properties found in political blog communities, we bootstrap the classification of links and blogs, as well as the blog rankings, from a set of known seed blogs. We test our algorithm on two datasets and achieve blog classification accuracy of 81.9% in a network of 404 blogs and 84.6% in a network of 1222 blogs, using only 2 seed blogs in each case. We analyze the results of our algorithm and show that misclassified blogs tend to be less important or less authoritative.
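The abstract describes the bootstrap only at a high level. As a minimal sketch, assuming homophily (a blog's outgoing links mostly point to blogs in its own political community), a seed-based bootstrap can alternate between labeling each link with the current class of its source blog and running a class-specific personalized PageRank over the links of that class, restarting on the blogs already in that class. Each blog then takes the class with the highest score, and those scores double as within-class rankings. This is a generic stand-in, not the authors' MultiRank procedure; the function names, the damping factor 0.85, and the "label a link by its source blog" rule are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical sketch: NOT the authors' MultiRank algorithm. Assumes homophily,
# i.e. a blog's outgoing links mostly point to blogs of its own political class.


def personalized_pagerank(nodes, edges, teleport, damping=0.85, iters=50):
    """Power-iteration PageRank over `edges` that restarts uniformly on `teleport`."""
    out = defaultdict(list)
    for src, dst in edges:
        out[src].append(dst)
    restart = {n: (1.0 / len(teleport) if n in teleport else 0.0) for n in nodes}
    score = dict(restart)
    for _ in range(iters):
        nxt = {n: (1 - damping) * restart[n] for n in nodes}
        for n in nodes:
            targets = out.get(n)
            if targets:
                share = damping * score[n] / len(targets)
                for t in targets:
                    nxt[t] += share
            else:
                # Dangling node: return its mass to the restart set.
                for t in teleport:
                    nxt[t] += damping * score[n] / len(teleport)
        score = nxt
    return score


def bootstrap(edges, seeds, rounds=10):
    """Grow blog labels outward from `seeds` (blog -> class) until they stop changing."""
    nodes = {n for e in edges for n in e} | set(seeds)
    classes = sorted(set(seeds.values()))
    labels = dict(seeds)  # starts with the seed blogs only
    scores = {}
    for _ in range(rounds):
        # Assumed rule: a link inherits the class of its (already labeled) source blog.
        link_labels = {(s, d): labels[s] for s, d in edges if s in labels}
        scores = {}
        for c in classes:
            class_edges = [e for e, lab in link_labels.items() if lab == c]
            teleport = {n for n, lab in labels.items() if lab == c}
            scores[c] = personalized_pagerank(nodes, class_edges, teleport)
        new_labels = dict(seeds)  # seed labels are never changed
        for n in nodes:
            if n in seeds:
                continue
            best = max(classes, key=lambda c: scores[c][n])
            if scores[best][n] > 0:  # blogs not yet reached stay unlabeled
                new_labels[n] = best
        if new_labels == labels:
            break
        labels = new_labels
    return labels, scores


def rank_within_class(labels, scores, cls):
    """Blogs labeled `cls`, ordered by their class-specific score (highest first)."""
    members = [n for n, c in labels.items() if c == cls]
    return sorted(members, key=lambda n: scores[cls][n], reverse=True)
```

For example, on a toy graph with two three-blog cycles joined by a single cross link and one seed per side, the labels spread around each cycle from its seed, and the cross link does not pull the other community's blogs into the wrong class because they already carry higher scores in their own class.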


Similar resources

The MultiRank Bootstrap Algorithm: Self-Supervised Political Blog Classification and Ranking Using Semi-Supervised Link Classification

We present a new semi-supervised learning algorithm for classifying political blogs in a blog network and ranking them within predicted classes. We test our algorithm on two datasets and achieve classification accuracy of 81.9% and 84.6% using only 2 seed blogs.


Semi-Supervised Learning Based Prediction of Musculoskeletal Disorder Risk

This study explores a semi-supervised classification approach using random forest as a base classifier to classify the low-back disorder (LBD) risk associated with industrial jobs. The semi-supervised classification approach uses unlabeled data together with a small amount of labeled data to create a better classifier. The results obtained by the proposed approach are compared with those o...


Semi-Supervised Learning for Blog Classification

Blog classification (e.g., identifying bloggers’ gender or age) is one of the most interesting current problems in blog analysis. Although this problem is usually solved by applying supervised learning techniques, the large labeled dataset required for training is not always available. In contrast, unlabeled blogs can easily be collected from the web. Therefore, a semi-supervised learning metho...


Extracting Prior Knowledge from Data Distribution to Migrate from Blind to Semi-Supervised Clustering

Although many studies have been conducted to improve clustering efficiency, most state-of-the-art schemes suffer from a lack of robustness and stability. This paper aims to propose an efficient approach to elicit prior knowledge in terms of must-link and cannot-link constraints from the estimated distribution of raw data in order to convert a blind clustering problem into a semi-supervised o...


Detecting Concept Drift in Data Stream Using Semi-Supervised Classification

A data stream is a sequence of data generated from various information sources at high speed and high volume. Classifying data streams faces three challenges: unlimited length, online processing, and concept drift. In related research, to meet the challenge of unlimited stream length, the stream is commonly divided into fixed-size windows or gradual forgetting is used. Concept drift refer...




Publication date: 2007